Computer implemented masked representation of data tables

ABSTRACT

In the computer software field, method and apparatus to obfuscate (mask or hide) computer data which is part of or accessed by a computer program. The method protects (hides) accesses to tables of data in terms of the place or position of each element in the table. It does this by providing an intermediate table which describes the positions of the elements of the first table or tables, but in a transformed (modified) fashion.

FIELD OF THE INVENTION

This disclosure describes a way to mask (hide or obfuscate) computer data and code against reverse engineering attacks.

BACKGROUND

Nowadays, there are more and more uses of computer software applications (programs) where the user may also be interested in learning information about the software code which is executed on his computing device. This is, for instance, the case with DRM (Digital Rights Management) applications, used to protect songs, games, applications, or any digital content, against fraud (such as illegal copying, spreading content over P2P networks, etc.). Such a user could be interested in (illegally) trying to copy songs, applications or games in order to redistribute them.

Of course, the role of the content distributor or owner is to protect the content against such malicious users (“hackers”). Various known means are used to achieve this goal. One is code obfuscation. Code obfuscation is a well known technique where source code in a computer programming language is made difficult to understand.

SUMMARY

As explained above, there is a known need for code obfuscation to protect content against illegal or unauthorized uses. The present inventors have recognized that tables of data in object (compiled) code (also called arrays in the code) also have to be protected.

The first need is to protect the data content itself, especially for those data tables containing critical information. This is mostly achieved currently by the use of masks and various other processes. Just as an illustrative example, consider a table with data. Instead of storing the data as is, it is stored using a masking function. In a first known solution, the associated unmasking process is only done when a given variable of the table has to be used. In a second known solution, the data is used as masked/protected but the process is adapted to include this kind of mask.

In all these cases, retrieving the original data means reversing the masking process. Performing a complementary dynamic analysis sometimes speeds this up. This means accessing the code at run time (execution time) since the unmasking operation is only done upon execution.

The second need, and this is a goal of this disclosure, is to protect the table accesses themselves. When a table of data is involved in a process, this protects the data contained inside this table, and also the place/position in the table of each data item or element. This approach can also be combined with and is not exclusive to methods involving the masking of the data itself.

Hence contemplated in accordance with the invention is the method for obfuscating the code, the associated computer software tool that does the obfuscating and is embodied in a computer program product stored in computer memory, the associated programmed computing apparatus that does the obfuscating, and the resulting obfuscated code embodied in a computer program product stored in computer readable memory. Also contemplated is the inverse method of executing the obfuscated code so as to de-obfuscate it, the associated programmed computing apparatus, and the resulting de-obfuscated code embodied in a computer program product stored in a computer readable memory.

BRIEF DESCRIPTION OF THE FIGURES

FIGS. 1A, 1B show use of the present invention.

FIG. 2 shows a master table in accordance with the invention.

FIG. 3 shows a conventional computing device used with the invention.

DETAILED DESCRIPTION

Consider a table of data designated T having a number designated tLen of entries of any data type. There are various ways to access the table. The first and simplest is to use the table T directly. This operation is specified in the C computer language as Table[position], which accesses the entry at the given position. The second is to define a memory pointer at a given position of T (address of T). Let ptrT designate this memory pointer. In a non-protected implementation, the (x+1)th element (entry) of T is accessed by reading T[x] (which is equivalent to *(ptrT+x) when ptrT points to the start of T).

In accordance with the invention, an intermediate table is provided which stores the positions of the data elements which need to be used for each data table. This is the principle of modified indexes. Consider the previously described data table T, and denote f as the function used to “shuffle” (transform) the accesses/positions of the table elements. This function has to be invertible and is used to create a bijection between two groups of tLen elements (where as stated above tLen is the number of the elements in the table). Denote fInv as the inverse function to function f. Table T is transformed (shuffled) into fT with the use of the pre-defined f function.
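By way of illustration only, the shuffling of T into fT may be sketched in C as follows. This sketch assumes the convention, consistent with the 10-byte example given later in this description, that element i of T is stored at position f(i) of fT. The names index_fn, f_rotate, fInv_rotate, shuffle_table and is_inverse_pair are hypothetical and are not part of the disclosure; the rotation is merely one simple bijection chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for an index transformation (f or fInv). */
typedef size_t (*index_fn)(size_t index, size_t tLen);

/* Example bijection on {0, ..., tLen-1}: rotate positions by one. */
static size_t f_rotate(size_t index, size_t tLen)
{
    return (index + 1) % tLen;
}

/* Inverse of f_rotate. */
static size_t fInv_rotate(size_t index, size_t tLen)
{
    return (index + tLen - 1) % tLen;
}

/* Build fT from T so that fT[f(i)] == T[i] for every i
   (assumed convention, see the lead-in). */
static void shuffle_table(const unsigned char *T, unsigned char *fT,
                          size_t tLen, index_fn f)
{
    assert(tLen > 0);
    for (size_t i = 0; i < tLen; i++) {
        fT[f(i, tLen)] = T[i];
    }
}

/* Returns 1 if fInv(f(i)) == i for every index i, otherwise 0. */
static int is_inverse_pair(index_fn f, index_fn fInv, size_t tLen)
{
    for (size_t i = 0; i < tLen; i++) {
        if (fInv(f(i, tLen), tLen) != i) {
            return 0;
        }
    }
    return 1;
}
```

Under this assumed convention, a read of the original element i becomes a read of fT at position f(i), and fInv maps a position in fT back to the original position.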

Accessing the element numbered a+1 inside the data table T is done as (again expressed logically):

b=T[a]

(where b is not the (a+1) element if table T is shuffled.)

But in accordance with the invention, this operation is replaced by:

b=fInv(a)

As seen, there is no particular difficulty in carrying this out in computer software and the resulting change inside the source code is minimal. The advantage is the protection of the data element against a static analysis by an attacker. When the present pointer principle is used in a sub-function where T is defined (using ptrT instead of T directly), this attack technique is much harder because the starting position (address in memory) of the data table is unknown.

The length of the table is unknown in the sub-function and the starting point (address) is also not known. This is however necessary to access the table since the pointer just gives an address in the (logical) memory. Therefore the present approach masks a data table very efficiently and is able to manage pointers.

Given all the data tables (where table also refers here to data arrays) T in a particular piece of software, at the compilation time of the source code of the piece of software, through a software “tool” (program), one generates a table of masks which will later contain the starting addresses of each of the tables T, the length tLen of each of the tables and a description of the functions f used to mask the addresses for each table T. Denote this table of masks as masterTable; it is depicted graphically in terms of its organization in FIG. 2. Note that the addresses are not available at compilation time; they are loaded into the tables at run time. So the memory room for masterTable is allocated at (code) compilation time; post-compilation, which is before or during code run (execution) time, the table entries are used to identify the pointers.
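One possible layout for masterTable is sketched below in C, purely as an illustration. The sketch assumes that f is described by the two coefficients of an affine map, that the room reserved at compilation time is a fixed-size array, and that the address is written into an entry at run time. All type, field and function names (MaskFn, MasterEntry, MasterTable, MASTER_CAPACITY, master_register, master_set_address) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of f as an affine map:
   f(i) = (mul * i + add) mod tLen. */
typedef struct {
    size_t mul;
    size_t add;
} MaskFn;

/* One masterTable entry: starting address, length, description of f. */
typedef struct {
    uintptr_t start;  /* starting address; not known until run time */
    size_t    tLen;   /* number of entries; known when the tool runs */
    MaskFn    f;      /* description of f; known when the tool runs */
} MasterEntry;

#define MASTER_CAPACITY 8  /* assumed fixed room reserved at compilation time */

typedef struct {
    MasterEntry entries[MASTER_CAPACITY];
    size_t      count;
} MasterTable;

/* Tool-time step: reserve an entry; returns its index, or -1 if full. */
static int master_register(MasterTable *mt, size_t tLen, MaskFn f)
{
    assert(mt != NULL && tLen > 0);
    if (mt->count >= MASTER_CAPACITY) {
        return -1;
    }
    MasterEntry *e = &mt->entries[mt->count];
    e->start = 0;  /* placeholder until the address is known */
    e->tLen = tLen;
    e->f = f;
    return (int)mt->count++;
}

/* Run-time step: record where the table actually lives in memory. */
static void master_set_address(MasterTable *mt, int id, const void *start)
{
    assert(mt != NULL && id >= 0 && (size_t)id < mt->count);
    mt->entries[id].start = (uintptr_t)start;
}
```

Separating registration from address assignment mirrors the statement that the room for masterTable is allocated at compilation time while the addresses are loaded at run time.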

This process 10 of obfuscating the code is shown in FIG. 1A, beginning with conventional source code 12 for a particular application, coded in any convenient programming language. The source code 12 is applied to the present obfuscating tool 16, resulting in the obfuscated source code 18. This source code 18 is then subject to the conventional compiler (each non-interpreted computer programming language conventionally has its own compiler). The result is the masked (obfuscated) object code 20. Of course this process of FIG. 1A takes place in a computer with relevant code and data being stored in computer memory and the tool 16 being itself executed by the computer processor. The tool can be integrated into the compiler. The tool can also produce a result in another programming language.

The software tool 16 processes the source code 12 to be masked in the following manner. When a table pointer requiring obfuscation is detected in the original (unmasked) source code 12 by the tool 16 (via specific annotations provided or present for instance in the original source code), the tool 16 modifies the occurrence of the pointer in the source code 12 with a call to a software handler function. (Handlers are well known generally; they are asynchronous and generic callback subroutines.) The table of masks denoted masterTable is then also updated if needed (one needs to update the masterTable each time a pointer to a new buffer, that is, a memory location, is detected).

As shown in FIG. 1A, at the time of (otherwise conventional) compilation of the source code into object (compiled) code, the tool translates the raw information provided from the source code files (pointers, arrays, etc.) into obfuscated pointer information. Two cases are thus possible: direct pointer references to arrays/tables, or absolute pointers. In each of these two cases, the tool statically parses the source code searching for pointers to obfuscate. Then if a buffer is already filled, the tool transforms it using one f transformation function as described above. (There can be as many f functions as the number of transformed tables.)

During the later execution of the obfuscated (masked) software compiled in this way and now in object code (compiled) form, the execution process 30 shown in FIG. 1B is as follows. When a pointer has to be used, the handler present in the compiled (object) code 20 is executed at 36 and actually resolves the obfuscated pointers by accessing masterTable, the table of masks. MasterTable is allocated at the time of compilation and filled post-compilation by the tool as explained above. The table entry information (which includes an address and length of the table for that pointer) is then used to identify the buffer which “owns” the pointer. From this point, the f function corresponding to the buffer is fetched from the table entry and the original offset value of the pointer is thereby retrieved to restore the unmasked data at 38. Note that when the bijection referred to above is used, its image function is known. This image is included in the code or retrieved from masterTable at execution time.
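The handler step just described may be sketched in C as follows, as an illustration under stated assumptions rather than as the actual handler. The sketch assumes byte-sized table elements, a masterTable entry holding a start address, a length and a single multiplier describing f(i) = (mul * i) mod tLen, and the convention that the element originally at offset i is stored at offset f(i). The names Entry, find_owner and resolve_pointer are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified masterTable entry (hypothetical field names). */
typedef struct {
    uintptr_t start;  /* starting address of the shuffled table */
    size_t    tLen;   /* number of byte-sized entries */
    size_t    mul;    /* f(i) = (mul * i) mod tLen; mul coprime with tLen */
} Entry;

/* Identify the buffer that "owns" ptr by comparing it against the
   start address and length of each entry; NULL if no entry matches. */
static const Entry *find_owner(const Entry *master, size_t n, const void *ptr)
{
    assert(master != NULL);
    uintptr_t p = (uintptr_t)ptr;
    for (size_t k = 0; k < n; k++) {
        if (p >= master[k].start && p - master[k].start < master[k].tLen) {
            return &master[k];
        }
    }
    return NULL;
}

/* Translate a pointer expressed with the original offset into a pointer
   to the location where that element is stored after shuffling. */
static const unsigned char *resolve_pointer(const Entry *master, size_t n,
                                            const unsigned char *ptr)
{
    const Entry *e = find_owner(master, n, ptr);
    if (e == NULL) {
        return NULL;
    }
    size_t original_offset = (size_t)((uintptr_t)ptr - e->start);
    size_t shuffled_offset = (e->mul * original_offset) % e->tLen;
    return (const unsigned char *)e->start + shuffled_offset;
}
```

In this sketch a pointer that falls outside every registered buffer resolves to NULL; how such a case would be handled in practice is not specified by the description.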

Given a data table T (with the corresponding masking information stored in the masterTable), consider a pointer pointing to a given position of the original table T: ptrT. Instead of using this pointer ptrT directly, the masterTable is accessed. Using ptrT and the associated table T length (tLen) recovered from the masterTable, it is possible to determine which element of which table T the pointer ptrT+x points to. Then, using the associated function f also recovered from masterTable, the access b=T[a] can be replaced so as to access the correct position of the shuffled table.

Consider an example in the following data table T in which table T is a 10-byte long table, starting at the offset value 0x1234 (first entry, top row). This means the table occupies offsets 0x1234 through 0x123D and ends at offset value 0x123E (exclusive). Let the f function (for transforming the pointer indexes) be the multiplication by 3 modulo the length of the table T. Function fInv is therefore defined as the multiplication by 7 modulo 10, since 3 times 7 equals 21, which is 1 modulo 10. This means the second element of the table (at 0x1235, second element top row) is stored at 0x1237 (second element, bottom row). The following table illustrates this, by showing the link between the original address (top row) and the modified address (bottom row). Note that this table does not represent the values stored at each address:

Original address: 0x1234 0x1235 0x1236 0x1237 0x1238 0x1239 0x123A 0x123B 0x123C 0x123D
Modified address: 0x1234 0x1237 0x123A 0x123D 0x1236 0x1239 0x123C 0x1235 0x1238 0x123B

Suppose the pointer ptrT is pointing to the third element of table T, and the process is only working with the table elements from the third to the tenth. If the pointer in the original source code was pointing to the element 0x1236, with the masking of the addresses this is pointing now to element 0x123A. Given a pointer, it is always possible to determine the starting address because the f function is a bijection over a finite group with a number of elements equal to the length of the table T.
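The address mapping of the 10-byte example above can be reproduced with the following C sketch, given for illustration only. The constants are taken from the example (starting offset 0x1234, length 10, multiplier 3 for f and 7 for fInv); the function names mask_address and unmask_address are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define T_BASE   ((uintptr_t)0x1234)  /* starting offset of table T */
#define T_LEN    ((uintptr_t)10)      /* length of table T in bytes */
#define F_MUL    ((uintptr_t)3)       /* f(i)    = 3 * i mod 10 */
#define FINV_MUL ((uintptr_t)7)       /* fInv(i) = 7 * i mod 10 */

/* Map an original address inside T to its modified (masked) address. */
static uintptr_t mask_address(uintptr_t addr)
{
    assert(addr >= T_BASE && addr - T_BASE < T_LEN);
    return T_BASE + (F_MUL * (addr - T_BASE)) % T_LEN;
}

/* Map a modified (masked) address back to the original address. */
static uintptr_t unmask_address(uintptr_t addr)
{
    assert(addr >= T_BASE && addr - T_BASE < T_LEN);
    return T_BASE + (FINV_MUL * (addr - T_BASE)) % T_LEN;
}
```

For instance, mask_address(0x1236) yields 0x123A, matching the third column of the table, and unmask_address applied to that result yields 0x1236 again.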

In one embodiment, the unmasking operation of the addresses can be done on-the-fly during the execution time. As explained above, during the code generation process the software tool has been used to change a simple access into a more complicated process that accesses a masked address based on the f function and the masterTable.

With this solution, each time a data table or a part of a data table is accessed, the code is modified (before or during the source code compilation) to access the right position. All this can be done without changing the original source code on the developer (source code) side and only by the dedicated tool used at or immediately before the compilation into object code. Note also this tool can be used to modify the static constants (data) stored in the data tables by rewriting the data where needed.

A typical transformation (shuffling) function f for modifying the addresses is the use of affine transformations defined modulo the number of elements of the table applied to the position index, expressed logically as (the multiplicative element of the affine transformation being co-prime with the number of elements of the table):

elementNumber_index = originalAddress + f(index)

Another suitable such function is a permutation table, for example.
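An affine transformation of the kind mentioned above, together with its inverse, may be sketched in C as follows. This is an illustrative sketch assuming f(i) = (a * i + b) mod n with a co-prime with n; the function names gcd_size, mod_inverse, affine_f and affine_fInv are hypothetical, and the inverse of a is found by a simple search, which is adequate only for small tables.

```c
#include <assert.h>
#include <stddef.h>

/* Greatest common divisor (Euclid's algorithm). */
static size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Multiplicative inverse of a modulo n, found by exhaustive search.
   Requires gcd(a, n) == 1. Returns 0 when n == 1. */
static size_t mod_inverse(size_t a, size_t n)
{
    assert(n > 0 && gcd_size(a, n) == 1);
    for (size_t x = 1; x < n; x++) {
        if (((a % n) * x) % n == 1) {
            return x;
        }
    }
    return 0;
}

/* f(index) = (a * index + b) mod n, a bijection when gcd(a, n) == 1. */
static size_t affine_f(size_t index, size_t a, size_t b, size_t n)
{
    assert(n > 0 && index < n && gcd_size(a, n) == 1);
    return ((a % n) * index + (b % n)) % n;
}

/* fInv(pos) = aInv * (pos - b) mod n, where aInv * a == 1 mod n. */
static size_t affine_fInv(size_t pos, size_t a, size_t b, size_t n)
{
    assert(n > 0 && pos < n);
    size_t shifted = (pos + n - (b % n)) % n;
    return (mod_inverse(a, n) * shifted) % n;
}
```

The co-primality check is what guarantees that f is a bijection; for example a multiplier of 4 with a table of 10 elements would map two different indexes to the same position and is rejected by the assertion.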

Another advantage of this solution is to provide a copy of the data table inside another data table of the same length. This is carried out in two steps. The first one is to recopy the table from its first element to its end for instance. The first element in the table is accessed as T[0] in the C computer language, for instance. Then, a simple update of the masterTable is necessary. It is also sometimes needed to recopy a table inside another table and to complete it for future treatments (appending data). This can be done easily and efficiently by first recopying the existing table at the beginning (this is not restrictive) and then completing the remaining part of the table using another transformation function, by updating the table only for the appended part. This means the new table is just considered as a set of two consecutive tables with specific dedicated modified addresses. Such an operation is impossible with prior solutions. Note also that table T can be enlarged and the added entries filled with padding (meaningless) values.
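The treatment of an appended table as two consecutive tables, each with its own transformation, may be sketched in C as follows. This is only an illustration; it assumes multiplicative shuffles for both parts, and the names mul_shuffle, TwoPartLayout and two_part_position are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Multiplicative shuffle f(i) = (mul * i) mod len;
   mul is assumed to be co-prime with len. */
static size_t mul_shuffle(size_t i, size_t mul, size_t len)
{
    assert(len > 0 && i < len);
    return (mul * i) % len;
}

/* Layout of an enlarged table: the existing part (len1 entries,
   shuffled with mul1) followed by the appended part (len2 entries,
   shuffled independently with mul2). */
typedef struct {
    size_t len1, mul1;
    size_t len2, mul2;
} TwoPartLayout;

/* Position in the enlarged table at which logical element i is stored. */
static size_t two_part_position(const TwoPartLayout *layout, size_t i)
{
    assert(layout != NULL && i < layout->len1 + layout->len2);
    if (i < layout->len1) {
        return mul_shuffle(i, layout->mul1, layout->len1);
    }
    return layout->len1 + mul_shuffle(i - layout->len1,
                                      layout->mul2, layout->len2);
}
```

Because the first part keeps its own transformation, appending data in this sketch leaves the positions of the already stored elements unchanged; only the appended part needs new entries.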

Another advantage of the present solution is the completely free possibility for the transformation function f associated with each pointer address and the length. This function can be quite complex, and can be changed depending on the tables (that is, differ for each table). One can also use a specific function to assemble two data tables (concatenation). Suppose the shuffling (transformation) of two data tables T1 and T2 is affine. Then it is easy to find a function such that the table T1 elements are shuffled over the odd positions, whereas the table T2 elements are shuffled over the even positions. It can, of course, be more complex.
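The odd/even arrangement just mentioned may be sketched in C as follows, for illustration only. The sketch assumes that T1 and T2 have the same length n, that each is shuffled by a multiplication modulo n with a multiplier co-prime with n, and that elements are bytes. The names pos_of_T1, pos_of_T2 and interleave_tables are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Position of element i of T1 in the combined table: always odd. */
static size_t pos_of_T1(size_t i, size_t mul1, size_t n)
{
    assert(n > 0 && i < n);
    return 2 * ((mul1 * i) % n) + 1;
}

/* Position of element i of T2 in the combined table: always even. */
static size_t pos_of_T2(size_t i, size_t mul2, size_t n)
{
    assert(n > 0 && i < n);
    return 2 * ((mul2 * i) % n);
}

/* Fill a combined table of 2 * n entries from T1 and T2. */
static void interleave_tables(const unsigned char *T1,
                              const unsigned char *T2,
                              unsigned char *combined, size_t n,
                              size_t mul1, size_t mul2)
{
    for (size_t i = 0; i < n; i++) {
        combined[pos_of_T1(i, mul1, n)] = T1[i];
        combined[pos_of_T2(i, mul2, n)] = T2[i];
    }
}
```

Each position function is itself affine in the shuffled index, so the combined table can be described by the same kind of entry as a single shuffled table.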

Given this description, coding the above described software tool in any convenient computer language would be routine, as would be combining that tool with conventional source code compilation. The tool is used prior to conventional compilation, or is part of the compiler. The resulting compiled code would then execute conventionally, resulting in transparency to the ultimate end user of the obfuscated software application, and no special software or hardware modifications are needed to that end user's computing device.

The following shows for one embodiment variables and parameters used in the above software tool:

Name         Type                   Description
T            Array/Table            Table/Array, most of the time referenced as T[IndexValue]
ptrT         pointer                Pointer to the Table/Array T
tLen         Integer                Length of the Table/Array T
x            integer                Offset value in the Table/Array
T[x]         type of array element  Element x in the Table T
f            function               Shuffling function
fInv         function               Unshuffling function (related to f)
fT           Array/Table            Table/Array T, which has been shuffled (processed)
masterTable  Array/Table            Table containing starting addresses of arrays and their associated length

FIG. 3 illustrates a typical and conventional computing system (apparatus) 60 that may be employed to implement processing functionality in embodiments of the invention. Computing systems of this type may be used in a computer server or user (client) computer or other computing device, for example. With reference to the present invention and present FIG. 1, this computing system may be the computer that executes the software tool or the end user computer that executes the resulting compiled code. Those skilled in the relevant art will also recognize how to implement embodiments of the invention using other computer systems or architectures. Computing system 60 may represent, for example, a desktop, laptop or notebook computer, hand-held computing device (personal digital assistant (PDA), cell phone, palmtop, etc.), mainframe, server, client, or any other type of special or general purpose computing device as may be desirable or appropriate for a given application or environment. Computing system 60 can include one or more processors, such as a processor 64 (equivalent to processor 38 in FIG. 2). Processor 64 can be implemented using a general or special purpose processing engine such as, for example, a microprocessor, microcontroller or other control logic. In this example, processor 64 is connected to a bus 62 or other communications medium.

Computing system 60 can also include a main memory 68 (equivalent to memories 52, 56, 58), such as random access memory (RAM) or other dynamic memory, for storing information and instructions to be executed by processor 64. Main memory 68 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 64. Computing system 60 may likewise include a read only memory (ROM) or other static storage device coupled to bus 62 for storing static information and instructions for processor 64.

Computing system 60 may also include information storage system 70, which may include, for example, a media drive 72 and a removable storage interface 80. The media drive 72 may include a drive or other mechanism to support fixed or removable storage media, such as flash memory, a hard disk drive, a floppy disk drive, a magnetic tape drive, an optical disk drive, a compact disk (CD) or digital versatile disk (DVD) drive (R or RW), or other removable or fixed media drive. Storage media 78 may include, for example, a hard disk, floppy disk, magnetic tape, optical disk, CD or DVD, flash memory or other fixed or removable medium that is read by and written to by media drive 72. As these examples illustrate, the storage media 78 may include a computer-readable storage medium having stored therein particular computer software or data.

In alternative embodiments, information storage system 70 may include other similar components for allowing computer programs or other instructions or data to be loaded into computing system 60. Such components may include, for example, a removable storage unit 82 and an interface 80, such as a program cartridge and cartridge interface, a removable memory (for example, a flash memory or other removable memory module) and memory slot, and other removable storage units 82 and interfaces 80 that allow software and data to be transferred from the removable storage unit 82 to computing system 60.

Computing system 60 can also include a communications interface 84 (equivalent to port 32 in FIG. 2). Communications interface 84 can be used to allow software and data to be transferred between computing system 60 and external devices. Examples of communications interface 84 can include a modem, a network interface (such as an Ethernet or other network interface card (NIC)), a communications port (such as for example, a USB port), a PCMCIA slot and card, etc. Software and data transferred via communications interface 84 are in the form of signals which can be electronic, electromagnetic, optical or other signals capable of being received by communications interface 84. These signals are provided to communications interface 84 via a channel 88. This channel 88 may carry signals and may be implemented using a wireless medium, wire or cable, fiber optics, or other communications medium. Some examples of a channel include a phone line, a cellular phone link, an RF link, a network interface, a local or wide area network, and other communications channels.

In this disclosure, the terms “computer program product,” “computer-readable medium” and the like may be used generally to refer to media such as, for example, memory 68, storage device 78, or storage unit 82. These and other forms of computer-readable media may store one or more instructions for use by processor 64, to cause the processor to perform specified operations. Such instructions, generally referred to as “computer program code” (which may be grouped in the form of computer programs or other groupings), when executed, enable the computing system 60 to perform functions of embodiments of the invention. Note that the code may directly cause the processor to perform specified operations, be compiled to do so, and/or be combined with other software, hardware, and/or firmware elements (e.g., libraries for performing standard functions) to do so.

In an embodiment where the elements are implemented using software, the software may be stored in a computer-readable medium and loaded into computing system 60 using, for example, removable storage drive 74, drive 72 or communications interface 84. The control logic (in this example, software instructions or computer program code), when executed by the processor 64, causes the processor 64 to perform the functions of embodiments of the invention as described herein.

This disclosure is illustrative and not limiting; further modifications will be apparent to those skilled in the art in light of this disclosure and are intended to fall within the scope of the appended claims.

1. A method of masking data in a set of computer instructions stored in computer readable memory, using a processor coupled to the computer readable memory, comprising the acts of: providing at least one first table of data in a first portion of the computer readable memory; each table having a length and an allocation in the first portion of the computer readable memory; the processor detecting a pointer to a location in the table in the set of computer instructions; the processor modifying the detected pointer in the set of computer instructions, so the detected pointer is modified by a transformation function; storing the modified pointer in an entry in a second table in a second portion of the computer readable memory, the entry including the allocation and length of the first table; and storing the set of computer instructions with the modified pointer and the second table in a third portion of the computer readable memory.
2. The method of claim 1, wherein the set of computer instructions includes a second table of data and a second pointer to the second table of data, and further comprising applying the method to the second table of data.
3. The method of claim 1, wherein the transformation function is one of an affine transformation, a multiplication by a modulus of a length of the first table, or a permutation.
4. The method of claim 1, wherein the act of modifying the detected pointer includes invoking a handler.
5. The method of claim 1, wherein the transformation function is a bijection.
6. The method of claim 1, wherein each entry in the second table also includes the transformation function for that pointer.
7. The method of claim 1, wherein the pointer is an absolute or an offset pointer.
8. The method of claim 1, wherein the method is performed for all pointers only during initialization of the set of computer instructions.
9. The method of claim 1, wherein the method is performed for each access to the first table.
10. The method of claim 1, further comprising the acts of: providing a third table of data; and applying the transformation function to the first and third tables of data, thereby to concatenate the tables.
11. The method of claim 1, wherein additional data is appended to the first table, thereby being a third table; and further comprising the act of applying a transformation function only to the additional data in the third table.
12. The method of claim 1, wherein the set of computer instructions is in source code format prior to application of the method, and the method is carried out before or during compilation of the source code into object code.
13. The method of claim 1, wherein the transformation function is included in the second table.
14. The method of claim 1, further comprising increasing the length of the first table and putting padding values in additional entries of the first table.
15. A computing device programmed to perform the method of claim 1.
16. A computer program product storing computer code to carry out the method of claim 1.
17. A computer readable memory storing the set of computer instructions resulting from the method of claim 1.
18. The method of claim 1, wherein the set of computer instructions resulting from the method is object code.
19. A method of accessing masked data in a set of computer instructions stored in a computer readable memory, using a processor coupled to the computer readable memory, comprising the acts of: the processor detecting a pointer to a table of masked data in the set of computer instructions stored in a first portion of the computer readable memory; upon detecting the pointer, the processor accessing a second table stored in a second portion of the computer readable memory, the second table including a plurality of entries, each entry corresponding to a table of data and having a length and starting address of the corresponding masked table of data and a transformation function; the processor modifying the masked table of data pointed to by the detected pointer, so the masked table of data is modified by the transformation function so as to be unmasked; and storing the unmasked table of data with the set of computer instructions in the computer readable memory.
20. The method of claim 19, wherein the set of computer instructions includes a second table of data and a second pointer to the second table of data, and further comprising applying the method to the second table of data.
21. The method of claim 19, wherein the transformation function is one of an affine transformation, a multiplication by a modulus of a length of the first table, or a permutation.
22. The method of claim 19, wherein the act of modifying the detected pointer includes invoking a handler.
23. The method of claim 19, wherein the transformation function is a bijection.
24. The method of claim 19, wherein each entry in the second table also includes the transformation function for that pointer.
25. The method of claim 19, wherein the pointer is an absolute or an offset pointer.
26. The method of claim 19, wherein the method is performed for all pointers only during initialization of the set of computer instructions.
27. The method of claim 19, wherein the method is performed for each access to the first table.
28. The method of claim 19, further comprising the acts of: providing a third table of data; and applying the transformation function to the first and third tables of data, thereby to de-concatenate the tables.
29. The method of claim 19, wherein additional data is appended to the first table, thereby being a third table; and further comprising the act of applying a transformation function only to the additional data in the third table.
30. The method of claim 19, wherein the transformation function is included in the second table.
31. The method of claim 19, wherein predetermined entries of the second table contain padding values.
32. A computing device programmed to perform the method of claim 19.
33. A computer program product storing computer code to carry out the method of claim 19.
34. A computer readable memory storing the set of computer instructions resulting from the method of claim 19.
35. The method of claim 19, wherein the set of computer instructions resulting from the method is object code.